Having run the Seoul Marathon in 2015, I thought this would be an interesting dataset. Primarily, I am interested in seeing how participants’ speed (minutes/km) changes as the marathon progresses. Like many runners, I hit the proverbial ‘wall’ around kilometer 32 (mile 20), and I expect to see a similar result in the data.
## 'data.frame': 5616 obs. of 11 variables:
## $ Overall.Position : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Gender.Position : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Category.Position: int 1 1 2 2 3 4 3 4 5 5 ...
## $ Category : Factor w/ 8 levels "MFI","MFM1","MFM2",..: 8 5 5 8 5 5 8 8 8 5 ...
## $ Race.No : int 21080 14 2 21077 18 21 21078 21090 21084 12 ...
## $ Country : Factor w/ 44 levels "Australia","Bahrain",..: 20 20 10 20 10 10 20 10 10 20 ...
## $ Official.Time : Factor w/ 4377 levels "2:12:12","2:12:14",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Net.Time : Factor w/ 4270 levels "2:12:11","2:12:13",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ X10km.Time : Factor w/ 2039 levels "","0:30:34","0:30:35",..: 3 2 3 3 2 2 2 3 3 2 ...
## $ Half.Way.Time : Factor w/ 3169 levels "","1:04:48","1:04:49",..: 2 2 3 2 2 2 2 4 5 2 ...
## $ X30km.Time : Factor w/ 3740 levels "","1:33:36","1:35:04",..: 2 2 2 2 2 2 2 2 3 2 ...
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 132.2 233.0 268.3 269.0 305.4 367.1
There’s quite a variation in finishing times. To illustrate, if the winner of the race could maintain his pace (incredibly unlikely, I admit), he could finish two marathons (2 × 132.2 = 264.4 minutes) before the average runner finishes one (269.0 minutes). Seeing that some runners took over six hours to finish, I think they should receive something more than a participant’s ribbon for their excellent endurance. The mean and median are nearly identical, and the first and third quartiles sit about equally far from the median, so the finishing times are roughly symmetric around the middle of the field.
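The time columns in the `str()` output are stored as `h:mm:ss` strings (factors), so the summary in minutes presumably comes from converting them first. The output above looks like R. The snippet below is only a Python sketch of that conversion, not the original code, and `to_minutes` is a hypothetical helper name. It also checks the two-marathon arithmetic using the fastest time listed (`2:12:12`) and the mean of 269.0 minutes from the summary.

```python
from typing import Optional


def to_minutes(hms: str) -> Optional[float]:
    """Convert an 'h:mm:ss' string such as '2:12:12' to minutes.

    Blank strings (the split columns contain "" for missing values) return None.
    """
    if not hms:
        return None
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 60 + minutes + seconds / 60


winner = to_minutes("2:12:12")  # fastest Official.Time level in the str() output
mean_time = 269.0               # mean finishing time (minutes) from the summary

# Two marathons at the winner's pace versus one marathon at the average time.
print(f"winner: {winner:.1f} min, two marathons: {2 * winner:.1f} min, mean: {mean_time} min")
print(2 * winner < mean_time)
```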
I expected African nations to have the fastest times, but Italy (!?) is a real surprise this high on the list. We’ll have to look for Italy in the next plot to see whether this is just one fast runner or a group of people representing the nation.
Italy has five runners, which is still very impressive. It’s neat to see that as the number of participants from a country increases, its median time approaches the overall median of 268 minutes, roughly as the law of large numbers would suggest.
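The country plot comes down to counting participants per country and taking each country’s median time. Below is a minimal sketch using Python’s standard library. The rows are made-up toy values, not taken from the dataset, and `country_medians` is a hypothetical name.

```python
from collections import defaultdict
from statistics import median


def country_medians(results):
    """Group (country, minutes) pairs and return {country: (count, median_minutes)}."""
    times_by_country = defaultdict(list)
    for country, minutes in results:
        times_by_country[country].append(minutes)
    return {c: (len(t), median(t)) for c, t in times_by_country.items()}


# Hypothetical toy rows (country, finishing time in minutes), not from the dataset.
rows = [("A", 250.0), ("A", 270.0), ("A", 290.0), ("B", 140.0), ("B", 160.0)]
for country, (n, med) in sorted(country_medians(rows).items()):
    print(country, n, med)
```

A country with only a handful of runners can sit far from the overall median. Larger groups have less room to do that.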
Since Kenya was the fastest nation overall, I am interested to see the summary results of its nine runners.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 132.2 132.5 139.4 145.5 147.4 186.9
## 3%
## 186.6075
Even the slowest Kenyan runner was faster than almost 97% of other runners!
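The `## 3%` line looks like the output of R’s `quantile(x, 0.03)` on all finishing times. By default (type 7), that function interpolates linearly between order statistics. Below is a Python sketch of that quantile, plus a helper for the share of finishers slower than a given time. The function names and the ten toy times are hypothetical. `statistics.quantiles(..., method="inclusive")` uses the same interpolation, so it serves as a cross-check.

```python
from statistics import quantiles


def r_quantile(data, p):
    """Quantile with linear interpolation, matching R's default quantile(type = 7)."""
    xs = sorted(data)
    h = (len(xs) - 1) * p           # 0-based fractional position
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def share_slower(data, t):
    """Fraction of finishers whose time (minutes) is strictly greater than t."""
    return sum(x > t for x in data) / len(data)


# Hypothetical finishing times in minutes, for illustration only.
times = [130, 140, 150, 160, 170, 180, 190, 200, 210, 220]
print(r_quantile(times, 0.03), quantiles(times, n=100, method="inclusive")[2])
print(share_slower(times, 185))
```

With the full list of finishing times, `share_slower(all_times, 186.9)` (where `all_times` is a hypothetical name) would give the exact share of runners behind the slowest Kenyan. That avoids reading the figure off the 3% quantile.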
The nationalities for the twenty fastest runners:
## [1] Kenya Kenya Ethiopia Kenya Ethiopia
## [6] Ethiopia Kenya Ethiopia Ethiopia Kenya
## [11] Mongolia Kenya Kenya Hong Kong SAR Italy
## [16] Australia Ethiopia Ethiopia Ethiopia Hong Kong SAR
## 44 Levels: Australia Bahrain Brazil ... United States
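The factor output is easier to read as a tally. The list below is copied from the output above. Counting it with `collections.Counter` shows that Ethiopia (8) and Kenya (7) account for fifteen of the twenty fastest finishers.

```python
from collections import Counter

# Nationalities of the twenty fastest finishers, in finishing order.
top20 = [
    "Kenya", "Kenya", "Ethiopia", "Kenya", "Ethiopia",
    "Ethiopia", "Kenya", "Ethiopia", "Ethiopia", "Kenya",
    "Mongolia", "Kenya", "Kenya", "Hong Kong SAR", "Italy",
    "Australia", "Ethiopia", "Ethiopia", "Ethiopia", "Hong Kong SAR",
]

# most_common() sorts by count; ties keep first-seen order.
for country, n in Counter(top20).most_common():
    print(f"{country}: {n}")
```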
The sections of the race are broken down as follows: first (0 km - 10 km), second (10 km - 21.1 km), third (21.1 km - 30 km), and fourth (30 km - 42.2 km). The sections differ in length, but measuring speed in minutes per kilometer puts them on a common scale, so we can still compare fluctuations in the participants’ speed. The second section is the fastest, and the final two sections become progressively slower. The slowest section is the fourth, with a median of 7.00 minutes/km.
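To turn cumulative split times into a speed for each section, divide the time spent in the section by its length. This is what makes the unequal sections comparable. The Python sketch below assumes that the split columns (`X10km.Time`, `Half.Way.Time`, `X30km.Time`) and the finish time have already been converted to minutes, and that blank splits have been dropped. It uses the exact half-marathon (21.0975 km) and marathon (42.195 km) distances, which the text rounds to 21.1 km and 42.2 km. `section_paces` is a hypothetical name, and the split times in the example are invented.

```python
# Section boundaries in km: start, 10 km, half way, 30 km, finish.
SPLITS_KM = [0.0, 10.0, 21.0975, 30.0, 42.195]


def section_paces(cumulative_minutes):
    """Minutes/km for each of the four sections.

    cumulative_minutes: elapsed time at 10 km, half way, 30 km and the finish.
    """
    if len(cumulative_minutes) != len(SPLITS_KM) - 1:
        raise ValueError("expected four cumulative split times")
    times = [0.0, *cumulative_minutes]
    return [
        (times[i + 1] - times[i]) / (SPLITS_KM[i + 1] - SPLITS_KM[i])
        for i in range(len(SPLITS_KM) - 1)
    ]


# Invented splits for a runner holding exactly 6 min/km throughout.
even = section_paces([60.0, 126.585, 180.0, 253.17])
# Invented splits for a runner who slows after halfway.
fading = section_paces([60.0, 126.585, 186.0, 280.0])
print([round(p, 2) for p in even])
print([round(p, 2) for p in fading])
```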
## [1] "Summary of minutes/km in first section of the marathon:"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.470 5.595 6.206 6.139 6.792 7.478
## [1] "Summary of minutes/km in second section of the marathon:"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.446 5.282 5.879 5.901 6.524 7.399
## [1] "Summary of minutes/km in third section of the marathon:"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.521 5.460 6.200 6.336 7.108 8.584
## [1] "Summary of minutes/km in fourth section of the marathon:"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.815 5.958 7.003 7.002 8.002 9.211
Looking at the data above, we can see quite a few interesting things. The first section of the marathon is run about as fast as (arguably a little faster than) the third section. All quantiles, except for the tenth, slow down more in the final section of the race than in the third section. Looking at the standard deviations of each runner’s section speeds, as expected, the fastest runners are the best at pacing themselves (i.e. they have the smallest standard deviation).
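The text does not show how the per-runner standard deviations were computed. One plausible reading, sketched below in Python, is the sample standard deviation of each runner’s four section speeds. This matches R’s `sd()`, which divides by n - 1. A smaller value means more even pacing. The runners and their paces are hypothetical.

```python
from statistics import stdev


def pacing_sd(section_paces):
    """Sample standard deviation of a runner's per-section minutes/km (like R's sd())."""
    return stdev(section_paces)


# Hypothetical paces (minutes/km) for the four sections.
runners = {
    "steady": [4.5, 4.4, 4.5, 4.6],
    "fading": [6.0, 5.8, 6.4, 8.0],
}
for name, paces in runners.items():
    print(name, round(pacing_sd(paces), 3))
```

Averaging this value within groups of finishers (for example, by finishing-time quartile) would give the kind of fast-versus-slow comparison described above.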